OCPBUGS-85426: improve DCM tests to reduce flakiness#31173
Conversation
Addressing some flakes in the DCM e2e tests: Race patching endpoints resource: add a polling to ensure the endpoints resource was created. Also, dump all endpoints and endpointSlice in the test namespace in case of a test failure. Timeout calling the router: increase the timeout, reading all the fast ones from a single const. Also, dumping router logs which should help to correlate failures when calling its service. Router container not found: router pod should be reporting container as running and healthy, but it was not running. Attaching the router log, which should help to explain why router container was not found. Example of testing deployment state log, added whenever it is scaled: ``` deployment state: replicas=4 pods=route-scale-in-5b7b4f6b8c-9ncjg/Running/10.128.1.27 // route-scale-in-5b7b4f6b8c-ck45n/Running/10.128.1.28 // route-scale-in-5b7b4f6b8c-srffr/Running/10.128.1.29 // route-scale-in-5b7b4f6b8c-v9hm2/Running/10.128.1.26 ``` Example of Endpoints and EndpointSlice resources listed if the test fails: ``` Endpoints: NAME ADDRESSES NOT READY ADDRESSES PORTS route-scale-in 10.128.1.26,10.128.1.27,10.128.1.28,10.128.1.29 9376 route-scale-in-khbh5 10.128.1.26 9376 EndpointSlices: NAME SERVICE ADDRESSES NOT READY ADDRESSES PORTS route-scale-in-khbh5-8ncmh route-scale-in-khbh5 10.128.1.26 9376 route-scale-in-rrhzv route-scale-in 10.128.1.27,10.128.1.28,10.128.1.29,10.128.1.26 9376 ``` https://redhat.atlassian.net/browse/OCPBUGS-85426
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@jcmoraisjr: This pull request references Jira Issue OCPBUGS-85426, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
WalkthroughThe PR enhances the router e2e test suite with improved observability and robustness. It introduces a configurable timeout constant, implements polling for endpoint fetching, adds deployment state logging helpers, extends failure diagnostics to include route and endpoint information, and refactors router pod targeting. ChangesRouter Test Suite Observability and Stability Enhancements
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jcmoraisjr The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@jcmoraisjr: This pull request references Jira Issue OCPBUGS-85426, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Scheduling required tests: |
|
@jcmoraisjr: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Addressing some flakes in the DCM e2e tests:
Example of testing deployment state log, added whenever it is scaled:
Example of Endpoints and EndpointSlice resources listed if the test fails:
https://redhat.atlassian.net/browse/OCPBUGS-85426
Summary by CodeRabbit